Senseaudio voice picker followup#2260

Open
QWERTY0205 wants to merge 7 commits into
nexu-io:mainfrom
QWERTY0205:senseaudio-voice-picker-followup

Conversation

@QWERTY0205

Fixes #

Why

What users will see

Surface area

  • UI — new page / dialog / panel / menu item / setting / empty state in apps/web or apps/desktop (including Electron menu bar)
  • Keyboard shortcut — new or changed
  • CLI / env var — new od subcommand or flag, new tools-dev / tools-pack / tools-pr flag, or new OD_* env var
  • API / contract — new /api/* endpoint, new SSE event, or changed shape in packages/contracts
  • Extension point — new entry under skills/, design-systems/, design-templates/, or craft/, or change to the skills protocol
  • i18n keys — added new translation keys (see TRANSLATIONS.md for the locale workflow)
  • New top-level dependency — adding any new entry to the root package.json (dependencies or devDependencies); workspace-package package.json files are out of scope. Include a paragraph on what we get vs. what bytes we ship (see CONTRIBUTING.md → Code style)
  • Default behavior change — changes what existing users experience without opting in (default model, default setting, file/SQLite schema, auto-network on startup, auto-install)
  • None — internal refactor, docs, tests, or translation update only

Screenshots

Bug fix verification

Validation

Fl0rencess720 and others added 7 commits May 18, 2026 13:59
Speech projects using `senseaudio-tts` had no way to discover the
voices a SenseAudio account can synthesise — the only escape hatch
was for the user to paste a raw voice_id into the New Project panel or
accept the daemon's default. Add an ElevenLabs-style picker so the
agent can present a dropdown of the account's available personas and
route to the right variant on dispatch.

Daemon
- `senseaudio-voices.ts` fetches `POST /v1/get_voice`, validates the
  base_resp envelope, and shapes the response into
  `Record<prefix, { name, description, variants }>` — the prefix
  (`male_0028`) keys 1:1 to a persona; colliding prefixes
  (`female_0030_*`) get keyed by full voice_id instead. The only
  metadata the API does not return — variant suffix → emotion label —
  is inlined as a `VARIANT_LABELS` const sourced from
  docs.senseaudio.cn. 10-min cache by api-key fingerprint. Shaping is
  wrapped in try/catch so an API field rename returns an empty
  catalogue (and the prompt falls back to the error path) instead of
  crashing the daemon.
- `GET /api/media/providers/senseaudio/voices` exposes the catalogue.
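
The shaping step described above can be sketched roughly as follows. This is a minimal illustration, not the code in `senseaudio-voices.ts`: the `RawVoice` row shape, the `description` field, the helper names, and the rule that a prefix "collides" when it maps to more than one `voice_name` are all assumptions made for the example.

```typescript
// Hypothetical raw row; only voice_id and voice_name are named in the PR text.
interface RawVoice {
  voice_id: string; // e.g. "male_0028_a"
  voice_name: string; // persona display name
  description?: string; // assumed field
}

interface Persona {
  name: string;
  description: string;
  variants: Record<string, string>; // voice_id -> variant label
}

type Catalogue = Record<string, Persona>;

// "male_0028_a" -> "male_0028"; ids without a letter suffix are returned unchanged.
function prefixOf(voiceId: string): string {
  const m = /^(.+)_[a-z]+$/.exec(voiceId);
  return m && m[1] ? m[1] : voiceId;
}

function shapeCatalogue(
  rows: RawVoice[],
  labels: Record<string, string>,
  fallbackLabel = "通用",
): Catalogue {
  // First pass: collect the distinct persona names seen under each prefix.
  const namesByPrefix: Record<string, string[]> = {};
  for (const row of rows) {
    const prefix = prefixOf(row.voice_id);
    const names = namesByPrefix[prefix] ?? [];
    if (names.indexOf(row.voice_name) === -1) names.push(row.voice_name);
    namesByPrefix[prefix] = names;
  }

  // Second pass: key by prefix, or by full voice_id when the prefix collides
  // (assumed here to mean: one prefix, several persona names).
  const out: Catalogue = {};
  for (const row of rows) {
    const prefix = prefixOf(row.voice_id);
    const collides = (namesByPrefix[prefix] ?? []).length > 1;
    const key = collides ? row.voice_id : prefix;
    const persona: Persona = out[key] ?? {
      name: row.voice_name,
      description: row.description ?? "",
      variants: {},
    };
    persona.variants[row.voice_id] = labels[row.voice_id] ?? fallbackLabel;
    out[key] = persona;
  }
  return out;
}
```

Two passes keep the collision check independent of row order, so a colliding prefix is never keyed by prefix for its first row and by full id for the rest.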

Web
- `apps/web/src/providers/senseaudio-voices.ts` mirrors the daemon
  shape with defensive normalisation.
- `ProjectView` wires the fetch through the BYOK compose path so both
  daemon and BYOK turns get the same catalogue.

Prompt
- New `senseAudioCatalogue` field on `ComposeInput` (contracts +
  daemon mirror). `renderSenseAudioPickerInstructions` emits a short
  bullet instruction, fixed `title` / `description` / `submitLabel`
  defaults the agent reuses verbatim (localised to the brief
  language), per-option label rules, post-submit variant-swap logic,
  and the catalogue JSON. Errors are sanitised through a
  `formatSenseAudioCatalogueErrorForPrompt` helper that classifies
  missing-key vs HTTP status code paths.
- Localisation lives in the agent: option labels and form copy get
  translated into the user's brief language at emit time; voice_ids
  stay verbatim.
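
As an illustration of the missing-key vs HTTP status classification mentioned above, a sanitising formatter might look like the sketch below. The `CatalogueError` union, the function name, and the message wording are hypothetical; the PR's actual `formatSenseAudioCatalogueErrorForPrompt` may differ.

```typescript
// Hypothetical error shape; the real helper's input type is not shown in the PR text.
type CatalogueError =
  | { kind: "missing-key" }
  | { kind: "http"; status: number }
  | { kind: "other"; message: string };

function sketchFormatCatalogueError(err: CatalogueError): string {
  switch (err.kind) {
    case "missing-key":
      // Points the user at Settings rather than exposing internals.
      return "SenseAudio API key is not configured. Ask the user to add it in Settings.";
    case "http":
      // Only the numeric status reaches the prompt, never the response body.
      return `SenseAudio voice list request failed (HTTP ${err.status}).`;
    case "other":
      // Raw messages may contain URLs or key fragments, so they are dropped.
      return "SenseAudio voice list is unavailable.";
  }
}
```

Returning fixed strings per class means nothing from the upstream response or the thrown error is copied into the system prompt.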

Tests
- `senseaudio-voices.test.ts` covers shape conversion, prefix
  collisions, hardcoded variant labels, the `通用` fallback, the
  base_resp error envelope, caching, missing-credentials early exit,
  and an "API field rename returns empty catalogue" defence.
- `system-prompt-senseaudio-voices.test.ts` covers picker injection
  triggered by `audioModel=senseaudio-tts`, the sanitised error path,
  and the missing-key Settings hint.
…-picker

# Conflicts:
#	apps/daemon/src/prompts/system.ts
#	packages/contracts/src/prompts/system.ts
The variant suffix → emotion label map (e.g. female_0033_b → "开心")
is documented on docs.senseaudio.cn/guides/voice/catalog.md but not
returned by the /v1/get_voice API, so the original PR hardcoded a
50-line table that drifts every time SenseAudio adds a persona.

Replace the hardcoded table with a one-time per-process scraper:

  1. fetch docs.senseaudio.cn/guides/voice/catalog.md (24h cached)
  2. regex `<voice_id>` `(<label>)` across the page → labels map
  3. shapeCatalogue() now consults the scraped map first

Fallback chain when shaping each variant entry:

  primary    doc-scraped label                 (fresh, authoritative)
  secondary  BACKUP_VARIANT_LABELS hardcoded   (used iff doc fetch fails
                                                or yields zero matches;
                                                cached only 5min so the
                                                live doc is retried fast
                                                once it recovers)
  per-voice  voice_name from the API           (used iff a specific
                                                voice_id is missing from
                                                both label sources —
                                                never a static "通用"
                                                placeholder anymore)
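
A rough sketch of the scrape and the fallback chain, under stated assumptions: the doc page is taken to list each voice as a backticked voice_id followed by a parenthesised label, and the function names are invented for this example. Following the "iff" wording above, the sketch picks one label source per fetch and only then falls back to `voice_name` per voice.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const FIVE_MIN_MS = 5 * 60 * 1000;

// Assumed doc format: `female_0033_b`(开心). ASCII and full-width
// parentheses are both accepted.
function scrapeLabels(markdown: string): Record<string, string> {
  const pattern = /`([a-z]+_\d+(?:_[a-z]+)?)`\s*[(（]([^)）]+)[)）]/g;
  const labels: Record<string, string> = {};
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(markdown)) !== null) {
    const id = m[1];
    const label = m[2];
    if (id && label) labels[id] = label.trim();
  }
  return labels;
}

interface LabelSource {
  labels: Record<string, string>;
  ttlMs: number; // how long the caller may cache this source
}

// docMarkdown is null when the doc fetch failed.
function pickLabelSource(
  docMarkdown: string | null,
  backup: Record<string, string>,
): LabelSource {
  const scraped = docMarkdown === null ? {} : scrapeLabels(docMarkdown);
  if (Object.keys(scraped).length > 0) {
    return { labels: scraped, ttlMs: DAY_MS }; // primary: doc-scraped, 24h
  }
  return { labels: backup, ttlMs: FIVE_MIN_MS }; // secondary: backup, 5min
}

// Per-voice fallback: the API's voice_name when the chosen source has no entry.
function labelFor(voiceId: string, voiceName: string, source: LabelSource): string {
  return source.labels[voiceId] ?? voiceName;
}
```

The short TTL on the backup path is what lets the live doc be retried quickly once it recovers.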

Net effect against today's prod catalogue: doc surfaces 111 voice_id
labels vs 82 in the hardcoded backup, so 29 voices that previously
fell back to "通用" (female_0006_a "深情", male_0023_a "平稳",
male_0004_a "平稳", … including popular personas) now carry their real
label without anyone needing to update source code.
The picker dropdown showed all ~12 personas in catalogue order with
no UX hint about which ones actually fit the user's brief. The user
had to read every option's label end-to-end to decide.

Add a REQUIRED step in renderSenseAudioPickerInstructions: before
composing the dropdown, the agent scores each persona for fit against
the brief (gender, age, register, tone, scenario keywords), then
marks the top 3 with prefix glyphs included in the localised label:

  ★    #1 best match
  ◆    #2, #3
  (none for the rest)

Top-3 options sort to the front of the dropdown in 1→2→3 order; the
remainder follow in original catalogue order. Glyphs are universal
(not zh-CN-only) so the localisation rule for the rest of the label
keeps working unchanged.
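
The ordering rule above is carried out by the agent from prompt instructions, but the intended result can be sketched in code. In this hypothetical helper the scoring has already happened: `rankedIds` holds the top matches in 1→2→3 order and is assumed to contain distinct ids. The names and the `PickerOption` shape are illustrative only.

```typescript
interface PickerOption {
  voiceId: string;
  label: string; // already localised
}

function orderWithTopThree(
  options: PickerOption[],
  rankedIds: string[],
  glyphs: string[] = ["★", "◆", "◆"],
): PickerOption[] {
  const top = rankedIds.slice(0, 3);

  // Ranked options come first, in rank order, with their glyph prefixed.
  const ranked: PickerOption[] = [];
  for (let i = 0; i < top.length; i++) {
    const match = options.filter((o) => o.voiceId === top[i])[0];
    if (match) {
      ranked.push({
        voiceId: match.voiceId,
        label: `${glyphs[i] ?? ""} ${match.label}`.trim(),
      });
    }
  }

  // Everything else keeps its original catalogue order and gets no glyph.
  const rest = options.filter((o) => top.indexOf(o.voiceId) === -1);
  return ranked.concat(rest);
}
```

Because the glyph is prepended to the finished label, the rest of the label can be translated independently of the ranking.
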
Update the variant-label-fallback test to match the new behaviour
introduced in the scrape-from-docs commit: voice_ids missing from
both the doc-scraped map and the BACKUP_VARIANT_LABELS hardcoded
backup now fall back to the persona's voice_name, not the static
"通用" placeholder.

The fetch mock now serves a 404 for the docs URL so the daemon
deterministically takes the backup path during the test, which was
the implicit assumption in the previous version.
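
One way to picture the mock routing described above is the sketch below. It is not the test's actual mock: `FakeResponse`, `routeFakeRequest`, and the URL substring checks are assumptions for illustration, and a real fetch mock would wrap the returned object in whatever response type the test framework expects.

```typescript
// Minimal stand-in for the parts of a response the daemon code would read.
interface FakeResponse {
  ok: boolean;
  status: number;
  body: string;
}

function routeFakeRequest(url: string, apiBody: string): FakeResponse {
  // Docs page: always 404, which forces the backup-label path deterministically.
  if (url.indexOf("docs.senseaudio.cn") !== -1) {
    return { ok: false, status: 404, body: "" };
  }
  // Voice list endpoint: serve the canned API payload.
  if (url.indexOf("/v1/get_voice") !== -1) {
    return { ok: true, status: 200, body: apiBody };
  }
  // Anything else is unexpected in this test; fail loudly.
  return { ok: false, status: 500, body: "" };
}
```
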
The previous prefix set (★ ◆ ◇) was too geometrically similar — the
filled and open diamonds were hard to tell apart at a glance, and the
black star + diamonds combo did not visually communicate "ranking".

Switch to the universal medal emojis. They map onto the gold/silver/
bronze metaphor users already recognise from sports, awards, and
leaderboards, and remain locale-neutral so the rest of the label can
still be translated freely.
@lefarcen requested a review from Siri-Ray May 19, 2026 11:22
@lefarcen added the size/XL, risk/high, and type/enhancement labels May 19, 2026
Contributor

@lefarcen lefarcen left a comment

Hey @QWERTY0205! Thanks for picking up the SenseAudio voice-picker follow-up — the changed areas make the intended direction visible, but the PR description is still mostly template placeholders.

Could you fill in ## Why, ## What users will see, ## Surface area, and ## Validation before pool review gets deep into this? This appears to touch UI plus daemon/API-contract voice discovery, so ticking the relevant surface-area boxes and adding the validation commands/screenshots will help reviewers scope the user-facing path quickly. Also, please replace the dangling Fixes # line with a real issue number or remove it if there is no linked issue.

Related: #2044 by @Fl0rencess720 is already open against the same SenseAudio voice-picker area and touches the same 12 files. You two may want to compare approaches; the maintainer team will decide which path lands.

Contributor

@Siri-Ray Siri-Ray left a comment

@QWERTY0205 thanks for the thoughtful follow-up on the SenseAudio picker. I found one BYOK/API-mode consistency issue to consider; it should be straightforward to fix by keeping the two prompt composers in sync.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

lines.push(' description: "Pick a voice for the read."');
lines.push(' submitLabel: "Use voice"');
lines.push('');
lines.push('For each dropdown option:');
Contributor

This contracts-side SenseAudio picker text is now missing the top-3 medal-ranking step that was added to the daemon composer in apps/daemon/src/prompts/system.ts (Top-3 highlighting, 🥇 / 🥈 / 🥉, and sorting the ranked options first). That matters because ProjectView imports composeSystemPrompt from @open-design/contracts for the web/BYOK compose path while daemon-mode runs use apps/daemon/src/prompts/system.ts, so SenseAudio projects behave differently depending on mode: daemon users get the ranked picker instructions, but BYOK/API-mode users only get the unranked dropdown guidance here. Please mirror the medal-ranking block in this contracts composer as well, and add/update a contracts prompt test that asserts the 🥇, 🥈, 🥉 prefixes so future follow-ups cannot drift the two prompt copies again.
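
A drift guard along the lines requested here could be as small as the sketch below. `missingMedals` is a hypothetical helper; a real test would pass it the output of each prompt composer for a `senseaudio-tts` project, and how those composers are called is not shown in this thread.

```typescript
const MEDALS = ["🥇", "🥈", "🥉"];

// Returns the medal prefixes that a rendered prompt fails to mention.
function missingMedals(prompt: string): string[] {
  return MEDALS.filter((medal) => prompt.indexOf(medal) === -1);
}
```

Asserting that the result is empty for both the daemon and the contracts prompt would make a future one-sided edit fail the suite.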

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

@lefarcen
Contributor

Hey @QWERTY0205, quick lifecycle update: #2044 has now folded in this SenseAudio voice-picker follow-up, is approved on the current head, and covers the same files/scope we cross-linked above.

Unless you see a piece here that is still missing from #2044, it is probably best to close this PR as superseded so review effort stays focused there. Thanks again for pushing the follow-up ideas — they helped clarify the final scope.

@github-actions
Contributor

@QWERTY0205 friendly reminder: this PR has been waiting on an author response for more than 3 days after reviewer or maintainer feedback.

When you have a chance, please reply here or push an update. To keep the queue manageable, PRs with no author activity for more than 5 days after feedback may be closed automatically, but they can be reopened when work resumes.

Labels

risk/high (High risk: apps/desktop, daemon, auth, migration, workflows, package deps)
size/XL (PR changes 700-1500 lines)
type/enhancement (Enhancement to existing feature)

Projects

None yet

4 participants